Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model
Omnidirectional video enables spherical stimuli with the 360 × 180° viewing range. Meanwhile, only the viewport region of omnidirectional
video can be seen by the observer through head movement (HM), and an even
smaller region within the viewport can be clearly perceived through eye
movement (EM). Thus, the subjective quality of omnidirectional video may be
correlated with HM and EM of human behavior. To fill in the gap between
subjective quality and human behavior, this paper proposes a large-scale visual
quality assessment (VQA) dataset of omnidirectional video, called VQA-OV, which
collects 60 reference sequences and 540 impaired sequences. Our VQA-OV dataset
provides not only the subjective quality scores of sequences but also the HM
and EM data of subjects. By mining our dataset, we find that the subjective
quality of omnidirectional video is indeed related to HM and EM. Hence, we
develop a deep learning model, which embeds HM and EM, for objective VQA on
omnidirectional video. Experimental results show that our model significantly
improves the state-of-the-art performance of VQA on omnidirectional video. Comment: Accepted by ACM MM 201
Experimental and CFD Study of Flow Phenomenon in Flowrate-amplified Flotation Element
To reduce the air consumption of an air flotation rail system, a flowrate-amplified flotation element was recently developed. This new flotation element utilises rotational flow to draw in extra air through an intake hole, and thus effectively improves the flotation height. Compared to a conventional flotation element, the flowrate-amplified flotation element can reduce air consumption by approximately 50% for the same load and flotation height. To gain an understanding of the flow phenomenon in the flowrate-amplified flotation element, experiments and CFD simulations are conducted in this study. Based on the results, we found that the flowrate-amplified flotation element can use part of the kinetic energy of the rotating air to suck in extra air. The intake hole greatly affects the pressure field and velocity field of the flotation element. Additionally, the effects of varying the gap height and the supplied flow rate are discussed. The results indicate that the pressure decreases as the gap height increases and increases as the supplied flow rate increases.
Toward Linearizability Testing for Multi-Word Persistent Synchronization Primitives
Persistent memory makes it possible to recover in-memory data structures following a failure instead of rebuilding them from state saved in slow secondary storage. Implementing such recoverable data structures correctly is challenging as their underlying algorithms must deal with both parallelism and failures, which makes them especially susceptible to programming errors. Traditional proofs of correctness should therefore be combined with other methods, such as model checking or software testing, to minimize the likelihood of uncaught defects. This research focuses specifically on the algorithmic principles of software testing, particularly linearizability analysis, for multi-word persistent synchronization primitives such as conditional swap operations. We describe an efficient decision procedure for linearizability in this context, and discuss its practical applications in detecting previously unknown bugs in implementations of multi-word persistent primitives.
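To make the notion of linearizability analysis concrete, the following is a minimal sketch of a brute-force check. It is not the decision procedure described in the abstract: it assumes a single-word compare-and-swap register, ignores crashes and persistence entirely, and tries every ordering of the operations, so it is only practical for very small histories. The function name `is_linearizable` and the tuple layout of a history entry are hypothetical choices made for this illustration.

```python
from itertools import permutations


def is_linearizable(history, initial=0):
    """Brute-force linearizability check for CAS operations on one word.

    Assumption (not from the abstract): each history entry is a tuple
    (invoke_time, response_time, expected, new, succeeded).
    """
    n = len(history)
    for order in permutations(range(n)):
        position = {op: i for i, op in enumerate(order)}
        # Real-time order: if a responded before b was invoked,
        # a must come before b in the candidate sequential order.
        respects_time = all(
            position[a] < position[b]
            for a in range(n)
            for b in range(n)
            if history[a][1] < history[b][0]
        )
        if not respects_time:
            continue
        # Replay the candidate order against a sequential CAS specification.
        value = initial
        for op in order:
            _, _, expected, new, succeeded = history[op]
            if (value == expected) != succeeded:
                break  # recorded result contradicts this ordering
            if succeeded:
                value = new
        else:
            return True  # every operation matched the specification
    return False


# Two overlapping CAS(0 -> 1) calls: one may succeed and the other fail.
print(is_linearizable([(0, 3, 0, 1, True), (1, 2, 0, 1, False)]))  # True
# Both cannot succeed, whatever the ordering.
print(is_linearizable([(0, 3, 0, 1, True), (1, 2, 0, 1, True)]))   # False
```

The factorial cost of enumerating orderings is the reason an efficient decision procedure, such as the one the abstract refers to, matters in practice.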
BInGo: Bayesian Intrinsic Groupwise Registration via Explicit Hierarchical Disentanglement
Multimodal groupwise registration aligns internal structures in a group of
medical images. Current approaches to this problem involve developing
similarity measures over the joint intensity profile of all images, which may
be computationally prohibitive for large image groups and unstable under
various conditions. To tackle these issues, we propose BInGo, a general
unsupervised hierarchical Bayesian framework based on deep learning, to learn
intrinsic structural representations to measure the similarity of multimodal
images. Particularly, a variational auto-encoder with a novel posterior is
proposed, which facilitates the disentanglement learning of structural
representations and spatial transformations, and characterizes the imaging
process from the common structure with shape transition and appearance
variation. Notably, BInGo can learn from small groups while being
applied to large-scale groupwise registration at test time, thus significantly reducing
computational costs. We compared BInGo with five iterative or deep learning
methods on three public intrasubject and intersubject datasets, i.e. BraTS,
MS-CMR of the heart, and Learn2Reg abdomen MR-CT, and demonstrated its superior
accuracy and computational efficiency, even for very large group sizes (e.g.,
over 1300 2D images from MS-CMR in each group).
DPATD: Dual-Phase Audio Transformer for Denoising
Recent high-performance transformer-based speech enhancement models
demonstrate that time-domain methods can achieve performance comparable to
time-frequency domain methods. However, time-domain speech enhancement systems
typically receive input audio sequences consisting of a large number of time
steps, making it challenging to model extremely long sequences and train models
to perform adequately. In this paper, we utilize smaller audio chunks as input
to achieve efficient utilization of audio information to address the above
challenges. We propose a dual-phase audio transformer for denoising (DPATD), a
novel model to organize transformer layers in a deep structure to learn clean
audio sequences for denoising. DPATD splits the audio input into smaller
chunks, where the input length can be proportional to the square root of the
original sequence length. Our memory-compressed explainable attention is
efficient and converges faster compared to the frequently used self-attention
module. Extensive experiments demonstrate that our model outperforms
state-of-the-art methods. Comment: IEEE DD